Semantic Similarity Measure Using Relational and Latent Topic Features
Authors
Abstract
Computing the semantic similarity between words is one of the key challenges in many language-based applications. Previous work tends to use the contextual information of words to estimate their degree of similarity. In this paper, we consider the relationships between words in local contexts as well as the latent topic information of words to propose a new distributed representation of words for measuring semantic similarity. The method models the meanings of a word as high-dimensional Vector Space Models (VSMs) that combine relational features from the word's local contexts with its latent topic features in the global sense. Our experimental results on popular semantic similarity datasets show a significant improvement in correlation with human judgements compared with other methods that use purely plain text.
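The idea of combining local relational features with global latent topic features can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the feature values are toy numbers, the two feature blocks are joined by weighted concatenation, the `topic_weight` parameter is hypothetical, and cosine similarity is used as the comparison function. The abstract does not specify how the features are extracted or combined.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine_features(relational, topic, topic_weight=0.5):
    """Join local (relational) and global (topic) features into one vector.

    Assumption: each block is normalized separately and then weighted,
    so that neither block dominates merely because of its scale.
    """
    rel = l2_normalize(relational)
    top = l2_normalize(topic)
    return ([(1.0 - topic_weight) * x for x in rel]
            + [topic_weight * x for x in top])


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data (hypothetical): counts of relational features observed in local
# contexts, and a topic distribution over three latent topics for each word.
words = {
    "car":        {"relational": [4, 1, 0, 3], "topic": [0.7, 0.2, 0.1]},
    "automobile": {"relational": [3, 1, 0, 4], "topic": [0.6, 0.3, 0.1]},
    "banana":     {"relational": [0, 0, 5, 1], "topic": [0.1, 0.1, 0.8]},
}

vectors = {
    word: combine_features(feats["relational"], feats["topic"])
    for word, feats in words.items()
}

sim_close = cosine_similarity(vectors["car"], vectors["automobile"])
sim_far = cosine_similarity(vectors["car"], vectors["banana"])
print(f"car vs automobile: {sim_close:.3f}")
print(f"car vs banana:     {sim_far:.3f}")
```

In this toy setup the pair with similar local contexts and similar topic distributions scores higher than the unrelated pair. Scores of this kind are what would then be compared against human similarity ratings on a benchmark dataset.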
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
A Semantic Feature for Relation Recognition Using a Web-based Corpus
Selecting appropriate features to represent an entity pair plays a key role in the task of relation recognition. However, existing syntactic or lexical features cannot capture the interaction between two entities because of the dearth of annotated relational corpora specialized for relation recognition. In this paper, we propose a semantic feature, called the latent topic feature, which...
Combination Features for Semantic Similarity Measure
Computing the semantic similarity between words is one of the key tasks in many language-based applications. Recent work has focused on using contextual clues for semantic similarity computation. In this paper, we propose a method to measure the semantic similarity between words using plain text contents. It takes into account information attributes (local) and topic information (global) of wor...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...
English and Chinese Bilingual Topic Aspect Classification: Exploring Similarity Measures, Optimal LSA Dimensions, and Centroid Correction of Translated Training Examples
This paper explores topic aspect (i.e., subtopic or facet) classification for collections that contain more than one language (in this case, English and Chinese), and investigates several key technical issues that may affect the classification effectiveness. The evaluation model assumes a bilingual user who has found some documents on a topic and identified a few passages in each language on sp...
Journal title:
- Int. J. Comput. Linguistics Appl.
Volume 5, Issue
Pages -
Publication date 2014